Up till now, we have used the vertex and pixel shader stages. However, there are many other stages available to create effects! This tutorial will focus on how to use the compute shader stage, taking advantage of it to add a sepia filter on top of our rendering.
While this kind of effect can also be achieved through a PostProcessPass, using compute capabilities can help leverage the hardware's full performance. For instance, with Dx12, it will in the long term be possible to use the async compute functionality of the hardware. This tutorial builds on top of the preceding tutorial, so be sure you already understand what has been done there. Let's not wait any longer and dig into the subject!
Compute programs behave the same as any other program when reading data provided by the application. It is possible to give them a ConstantBuffer, a Texture, and so on. However, writing to a resource is slightly different.
In the compute stage, no target can be written to. In fact, nothing is rasterized, and as such, compute programs usually write manually to buffers that can later be interpreted by the application. In graphics API terms, this is an Unordered Access View (UAV) resource, which can be both read from and written to.
Within nkGraphics, we will use resources from the Buffer class. These buffers can hold binary data that can be freely interpreted by the application. They can also be prepared to be read and written from programs, which makes them the way to exchange data with compute shaders.
Let's first see how we can create such a resource, starting with the includes.
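A sketch of what they can look like, assuming header paths that mirror the other resource classes of the SDK:

```cpp
// Assumed header paths, following the layout of the other resource classes.
#include <NilkinsGraphics/Buffers/Buffer.h>
#include <NilkinsGraphics/Buffers/BufferManager.h>
```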
Then, we can create the buffer.
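Assuming the manager follows the same createOrRetrieve pattern as the other resource managers we have used so far:

```cpp
// Allocate the buffer through its dedicated manager, which keeps track of it.
nkGraphics::Buffer* buffer = nkGraphics::BufferManager::getInstance()->createOrRetrieve("computeBuffer");
```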
As usual, we use the dedicated manager to allocate the resource and keep track of it. The next step is to set the buffer up, starting with the kind of usage we will make of it.
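With hypothetical setter names, this could look like:

```cpp
// Hypothetical setters : flag the buffer for UAV (write) access
// and for shader resource (read) access.
buffer->setComputeUsage(true);
buffer->setShaderResourceUsage(true);
```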
The buffer will be used both for compute usage (aka writing, UAV) and shader resource usage (standard reading, like a texture). These two calls allow the component to properly set up the buffer for what it is meant for. Next, we need to set up the size of the buffer.
Here we think in terms of an image, as this program will be a filter: it will process pixels over an 800x600 image. As such, we first set the element byte size, corresponding to one pixel in its format. Then, the number of elements can be given.
To end this step, we trigger the loading of the buffer so that it allocates all the rendering resources necessary.
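Put together, and still with assumed setter names, sizing and loading could look like:

```cpp
// One element per pixel : a R8G8B8A8 pixel fits in 4 bytes (one uint),
// and we need 800x600 of them.
buffer->setElementByteSize(4);
buffer->setElementCount(800 * 600);

// Loading allocates all the rendering resources needed.
buffer->load();
```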
Once the buffer is ready, we can move on to the next resource required: the texture that will receive the temporary rendering for the filter to read from. This texture will be a render target using the size of the window, and it will be referenced as the color target in a TargetOperations we will detail later.
We require a little more setup than simply loading the texture from a file. As it is manually created, we need to specify some info so that nkGraphics knows what to create. We set its size, aligned on the window. Next is to ensure the format used is the one we expect within the rendering pipeline. An important step is to flag textures that are supposed to be render targets; this is also true for targets used for depth. Once everything is set up, we load the texture, making it ready to be used.
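A sketch of this setup, with hypothetical setter and format names:

```cpp
// Hypothetical setters : size aligned on the window, the format expected
// within the pipeline, and the render target flag so nkGraphics prepares
// the texture accordingly.
nkGraphics::Texture* target = nkGraphics::TextureManager::getInstance()->createOrRetrieve("sceneTarget");
target->setWidth(800);
target->setHeight(600);
target->setFormat(nkGraphics::R8G8B8A8_UNORM);
target->setRenderTargetFlag(true);
target->load();
```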
Once the resources are ready, we need to set up our new effect. We will require two new shaders: a compute one to process the data, and a post-process one to copy the data back into our rendering surface.
Let's begin by setting up the compute shader. First, it will require its program.
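Following the pattern used for graphics programs so far (the ProgramSourcesHolder usage and the compute slot name are assumptions), this could be:

```cpp
// Create the program and feed it sources from memory ; setComputeMemory
// is assumed to mirror the vertex / pixel equivalents.
nkGraphics::Program* computeProgram = nkGraphics::ProgramManager::getInstance()->createOrRetrieve("computeProgram");

nkGraphics::ProgramSourcesHolder sources;
sources.setComputeMemory(computeSrc); // computeSrc : the HLSL source shown just after
computeProgram->setFromMemory(sources);
computeProgram->load();
```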
We create the program through the manager and prepare its source before triggering the load. This time, the sources feed only one stage: the compute one. The program will sort out what it needs to do when loading itself, based on the sources provided.
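The compute source itself, reconstructed from the walkthrough below (register slots, names, and the cbuffer layout are assumptions), can look like:

```hlsl
// Constant buffer receiving target information from the application.
cbuffer PassConstants : register(b0)
{
    uint texWidth;
    uint texHeight;
}

// One uint per pixel, colors packed as R8G8B8A8, on a UAV slot.
RWStructuredBuffer<uint> outputBuffer : register(u0);

// The temporary render target holding the scene's rendering.
Texture2D sourceTexture : register(t0);

// Each group spawns 32x32x1 threads.
[numthreads(32, 32, 1)]
void main(uint3 id : SV_DispatchThreadID)
{
    // One thread processes one pixel : discard threads outside the image,
    // so we never touch a value outside of the buffer.
    if (id.x >= texWidth || id.y >= texHeight)
        return;

    // Load the rendered color and apply the classic sepia weights.
    float4 color = sourceTexture.Load(int3(id.xy, 0));
    float3 sepia;
    sepia.r = dot(color.rgb, float3(0.393, 0.769, 0.189));
    sepia.g = dot(color.rgb, float3(0.349, 0.686, 0.168));
    sepia.b = dot(color.rgb, float3(0.272, 0.534, 0.131));
    sepia = saturate(sepia);

    // Pack the result into one uint, each channel on its own byte.
    uint packed = (uint)(sepia.r * 255.0);
    packed |= (uint)(sepia.g * 255.0) << 8;
    packed |= (uint)(sepia.b * 255.0) << 16;
    packed |= 0xFFu << 24;

    // Store at the index corresponding to this thread's pixel.
    outputBuffer[id.y * texWidth + id.x] = packed;
}
```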
The program here is pretty straightforward, once we get past the new syntax introduced for compute shading. We have a constant buffer, which will receive target information so that we know which pixels we have to process. Then, we get the structure describing how we will store the data we compute.
This data is accessed through the RWStructuredBuffer primitive, registered to a UAV slot. This structure allows reading from and writing to the attached buffer. The buffer will consist of unsigned integers in which we will store our colors, packed as R8G8B8A8.
Next come the texture we will read from, and the program's main function. The [numthreads(32, 32, 1)] line just before it means each group spawns 32x32x1 threads. Let's keep that in mind for later, as it will control the number of groups we want to spawn.
The idea for the main function is that one thread will process one pixel, so we need to find, from the thread's IDs, which pixel it corresponds to. The function first checks whether the thread ID is still within the texture's boundaries, as we would not want to touch a value outside of the buffer. After the check, it loads the color and applies the sepia filter.
The final step is probably one we could avoid by using floats rather than compacting the data, but I found it interesting both for the reader and the writer (me :D) to take a look at data packing. We use bitwise operations to shift data around and represent a pixel within one uint, each channel sitting on its own byte. We can then store the value in the buffer, at the index computed for the given thread.
Now that the program is ready, we need to create the shader that will use it.
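A sketch of this step; apart from addUavBuffer, which the text below introduces, the method names are assumptions:

```cpp
// Create the shader and attach its compute program.
nkGraphics::Shader* computeShader = nkGraphics::ShaderManager::getInstance()->createOrRetrieve("computeShader");
computeShader->setProgram(computeProgram);

// Feed the texture to read from ; the constant buffer holding the target's
// size would be fed here too, like in the previous tutorials.
computeShader->addTexture(target);

// New : register the buffer on a UAV slot, so the program can write to it.
computeShader->addUavBuffer(buffer);

computeShader->load();
```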
The story is the same as usual: create the shader, attach its program, and then specify how resources will be fed. While this repeats a lot of what we've done up till now, you might notice a new way of specifying the buffer, via the addUavBuffer call. This function adds a buffer that is meant to be written to. From an HLSL point of view, these resources will be linked to the UAV slots, which is where our RWStructuredBuffer sits.
This covers the filter shader. Now we need another step in our shader chain, the last one copying the buffer back into the final rendering surface. Let's see what its program looks like.
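A sketch of such a program (entry point names, slots, and the cbuffer layout are assumptions):

```hlsl
// The buffer is now bound read-only, on a texture (t) slot.
StructuredBuffer<uint> computedBuffer : register(t0);

cbuffer PassConstants : register(b0)
{
    uint texWidth;
    uint texHeight;
}

struct VertexInput
{
    float4 position : POSITION;
    float2 uvs : TEXCOORD0;
};

struct PixelInput
{
    float4 position : SV_POSITION;
    float2 uvs : TEXCOORD0;
};

// The vertex stage simply passes variables through.
PixelInput graphicsVertexShader(VertexInput input)
{
    PixelInput toPixel;
    toPixel.position = input.position;
    toPixel.uvs = input.uvs;
    return toPixel;
}

// The pixel stage does the opposite of the compute shader :
// match the pixel back to a buffer index, then unpack the color.
float4 graphicsPixelShader(PixelInput input) : SV_TARGET
{
    uint2 pixel = uint2(input.uvs * float2(texWidth, texHeight));
    pixel = min(pixel, uint2(texWidth - 1, texHeight - 1));
    uint packed = computedBuffer[pixel.y * texWidth + pixel.x];

    // Extract each byte back into a [0, 1] channel.
    float4 color;
    color.r = (packed & 0xFF) / 255.0;
    color.g = ((packed >> 8) & 0xFF) / 255.0;
    color.b = ((packed >> 16) & 0xFF) / 255.0;
    color.a = (packed >> 24) / 255.0;
    return color;
}
```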
This program will be used as a post-process shader. As such, the vertex stage simply passes variables through to the pixel stage. Then, we basically do the opposite of the compute shader: match a pixel back to the index a thread wrote to. We unpack the color and paste it onto the render target!
This leaves us with the shader to create.
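Same pattern as before; the exact call binding a buffer to a texture slot is an assumption here:

```cpp
// copyProgram : loaded like the compute program, this time feeding
// the vertex and pixel stages with the source above.
nkGraphics::Shader* copyShader = nkGraphics::ShaderManager::getInstance()->createOrRetrieve("copyShader");
copyShader->setProgram(copyProgram);

// The buffer is fed for reading this time, landing on a texture (t) slot.
copyShader->addTexture(buffer);

copyShader->load();
```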
There is nothing really new here, apart from the way the buffer is fed to the program. We now feed it as a texture, so it will be bound to a texture slot. This is because our pixel stage only needs to read the buffer. As such, having it bound as a texture resource is a good way to access it.
This justifies why we requested the buffer to be prepared for both compute (UAV) and shader resource (texture) usage. This way, we ensured everything was ready to use it in both cases.
And with that, we have all the building blocks we need to prepare our compositor.
Our compositor will need to be changed a bit here. The idea will be to:
- Render the scene into the texture we created, instead of directly into the back buffer.
- Run the compute filter over this texture, storing the sepia result in our buffer.
- Target the final rendering surface again, and copy the buffer back into it.
Which results in code along these lines.
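A sketch of this setup; the compositor-side method names are assumptions, while the pass structure and group counts follow the explanation around:

```cpp
// Get the compositor and add a node inside it.
nkGraphics::Compositor* compositor = nkGraphics::CompositorManager::getInstance()->createOrRetrieve("compositor");
nkGraphics::CompositorNode* node = compositor->addNode();

// First target operations : render into the texture created earlier.
nkGraphics::TargetOperations* sceneOps = node->addOperations();
sceneOps->setColorTarget(target);

sceneOps->addClearTargetsPass();
sceneOps->addRenderScenePass();
sceneOps->addPostProcessPass(postProcessShader); // carried over from the previous tutorial

// The compute pass runs our filter : 25x19 groups of 32x32x1 threads
// are enough to cover the 800x600 pixels.
nkGraphics::ComputePass* computePass = sceneOps->addComputePass();
computePass->setShader(computeShader);
computePass->setX(25);
computePass->setY(19);

// Second target operations : back on the final surface, copy the buffer in.
nkGraphics::TargetOperations* copyOps = node->addOperations();
copyOps->setToChainTarget(true); // hypothetical : target the window's back buffer
copyOps->addPostProcessPass(copyShader);
```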
First, we get the compositor, add a node inside, and declare the first target operations, rendering to the render target we created earlier. The passes are the ones described above: clear, render scene, post process, and compute. The compute pass will run the filter shader we declared earlier.
One important bit to remember is that our shader spawns groups of 32x32x1 threads. We need to process 800x600 pixels, which means that on the X axis we need at least 25 groups (25x32 = 800 threads), while on the Y axis we need at least 19 (19x32 = 608 threads, the extra ones being discarded by the boundary check). This is the purpose of setting the X and Y of the pass, controlling how many groups we spawn on each axis.
Do note that if the texture had a different size, we would need to spawn fewer or more groups. Finding the balance between the number of threads per group and the number of groups will often depend on the problem and on the load the shader puts on the hardware. With nkGraphics, you have all the freedom to try different setups and see what is most efficient for you.
As a final step, we set up a new target to render to, and copy the buffer into it. That's it for our compositor!
If we now launch the program, we should witness a new feel for our rendering:
With this, we have seen a bit more of how compositors can be leveraged to augment the rendering, along with a new pass type: the compute pass. Many things can be run in this general-purpose stage, like the sepia filter we just wrote. It can also serve many other purposes: generating a density field for a marching cubes algorithm, intersection / visibility checks... You now have an idea of how this can be leveraged within nkGraphics.
This tutorial is now done, and I hope you enjoyed it. nkGraphics has more capabilities still, so this is not the end of the tutorial series... Hang on tight for the next ones!